Speech Errors on Frequently Observed Homophones in French: Perceptual Evaluation vs Automatic Classification
Abstract
The present contribution aims at increasing our understanding of automatic speech recognition (ASR) errors involving frequent homophone or near-homophone words by comparing them with perceptual results. The long-term aim is to improve acoustic modelling of these items to reduce automatic transcription errors. A first question of interest is whether homophone words such as et (and) and est (is), for which ASR systems rely on language model weights, can be discriminated in a perceptual transcription test with similar n-gram constraints. A second question concerns the acoustic separability of the two homophone words using appropriate acoustic and prosodic attributes. The perceptual test reveals that, even though automatic and perceptual errors correlate positively, human listeners given approximately the information available to a 4-gram language model resolve local ambiguity more efficiently than ASR systems. The corresponding acoustic analysis shows that the homophone words may be distinguished using relevant acoustic and prosodic attributes. A first experiment in automatic classification of the two words using data mining techniques highlights the role of prosodic information (duration and voicing) and contextual information (co-occurrence of pauses). Preliminary results suggest that additional levels of information may be considered in order to efficiently represent and factorize the word variants observed in speech and to improve automatic speech transcription.
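As a rough illustration of what "the information available to a 4-gram language model" could mean for a listener, the sketch below extracts the words surrounding a target homophone. It assumes a window of three words on each side, since every 4-gram containing the target word falls inside that span; this window size, the function name `context_window`, and the example sentence are all hypothetical and are not taken from the paper.

```python
from typing import List


def context_window(tokens: List[str], target: int, order: int = 4) -> List[str]:
    """Return the words that n-grams of the given order can span around tokens[target].

    Assumption (not from the paper): the window is (order - 1) words on each side,
    clipped at the sentence boundaries.
    """
    if not 0 <= target < len(tokens):
        raise IndexError("target index out of range")
    span = order - 1  # context words on each side of the target
    start = max(0, target - span)
    end = min(len(tokens), target + span + 1)
    return tokens[start:end]


# Invented example sentence containing both homophones, "est" and "et".
sentence = "il dit que le projet est prêt et que tout ira bien".split()
chunk = context_window(sentence, sentence.index("est"))
print(" ".join(chunk))
```

With the default order of 4, the chunk is at most seven words long and centred on the target word, which is the kind of local context a listener would be restricted to under this assumption.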
Similar resources
A perceptual investigation of speech transcription errors involving frequent near-homophones in French and American English
This article compares the errors made by automatic speech recognizers to those made by humans for near-homophones in American English and French. This exploratory study focuses on the impact of limited word context and the potential resulting ambiguities for automatic speech recognition (ASR) systems and human listeners. Perceptual experiments using 7-gram chunks centered on incorrect or correc...
Cross-Lingual Study of ASR Errors: On the Role of the Context in Human Perception of Near-Homophones
It is widely acknowledged that human listeners significantly outperform machines when it comes to transcribing speech. This paper presents a paradigm for perceptual experiments that aims to increase our understanding of human and automatic speech recognition errors. The role of the context length is investigated through perceptual recovery of small homophonic words or near-homophones yielding f...
Evaluation of a Phone-Based Anomaly Detection Approach for Dysarthric Speech
Perceptual evaluation is still the most common method in clinical practice for diagnosing people with speech disorders and monitoring the progression of their condition. Many automatic approaches have been proposed to provide objective tools for dealing with speech disorders and to help professionals evaluate the severity of speech impairments. This paper investigates an automatic phone-based anom...
Cross-lingual studies of ASR errors: paradigms for perceptual evaluations
It is well known that human listeners significantly outperform machines when it comes to transcribing speech. This paper presents a progress report on the joint research into automatic vs. human speech transcription and on the perceptual experiments developed at LIMSI that aim to increase our understanding of automatic speech recognition errors. Two paradigms are described here in which human...
How are word-final schwas different in the north and south of France?
The aim of this paper is twofold: (i) to give a large-scale description of realized word-final schwas in French lexical words for different regions (North vs. South) and different speaking styles (read vs. spontaneous speech); (ii) to highlight differences in prosodic features and test these differences via automatic classification techniques. The proposed study relies on a subset of 12.5 hours of th...